# Voice Wake Activation - Implementation Guide

## Overview

Voice wake activation allows users to control ATOM agents using natural voice commands through the menu bar companion.
**Current Status**: Foundation Complete
- ✅ Backend voice command processor
- ✅ Frontend voice client
- ✅ Voice feedback UI
- ⏳ Native speech recognition (platform-specific)
---
## Architecture

### Components
- **Voice Command Processor** (Python)
  - Natural language intent classification
  - Command routing and execution
  - Agent integration
- **Voice Commands Client** (TypeScript)
  - Speech Recognition API wrapper
  - WebSocket communication
  - React hooks for UI integration
- **Voice Feedback UI** (React)
  - Listening state indicator
  - Transcript display
  - Audio waveform visualization
- **Native Integration** (Rust/Tauri)
  - Platform-specific speech recognition
  - Wake word detection
  - Audio capture
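The components above pass voice commands from client to backend as JSON. As a sketch, a server-side helper like the following (hypothetical; field names taken from the `/api/voice/command` example later in this guide) could validate a payload before routing it to the processor:

```python
# Hypothetical server-side validation of an incoming voice-command payload.
# Field names follow the /api/voice/command example; the helper itself is
# illustrative, not part of the actual codebase.
from typing import Any

REQUIRED_FIELDS = {"command": str, "confidence": float, "agent_id": str}

def validate_command_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; empty means the payload is usable."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"wrong type for {name}")
    confidence = payload.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0 and 1")
    return errors
```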
---
## Current Implementation

### Backend (Complete)

#### Voice Command Processor
```python
from core.voice_command_processor import VoiceCommandProcessor

processor = VoiceCommandProcessor(db)
result = await processor.process_voice_command(
    tenant_id="tenant-123",
    agent_id="agent-456",
    command="Open dashboard",
    confidence=0.95
)
```

#### API Endpoint
```http
POST /api/voice/command
Content-Type: application/json
Authorization: Bearer <token>

{
  "command": "Execute sales report",
  "confidence": 0.92,
  "agent_id": "agent-456"
}
```

### Frontend (Complete)
#### Voice Command Client

```tsx
import { useVoiceCommands } from '@/lib/voice-commands';

function VoiceControl() {
  const { isListening, transcript, startListening, sendCommand } =
    useVoiceCommands('tenant-123');

  return (
    <>
      <button onClick={startListening}>Start Listening</button>
      {isListening && <p>Listening...</p>}
      {transcript && <p>You said: {transcript}</p>}
    </>
  );
}
```

#### Voice Feedback Component
```tsx
import { VoiceFeedback } from '@/components/menu-bar/VoiceFeedback';

<VoiceFeedback
  isListening={true}
  transcript="Open dashboard"
  response="Opening dashboard now"
/>
```

---
## Platform-Specific Implementation

### macOS (Recommended for First Release)

#### Native Speech Recognition
```rust
// src-tauri/src/voice_wake_macos.rs
use cocoa::base::id;
use cocoa::foundation::NSString;

pub struct MacOSVoiceWake {
    recognizer: id, // SFSpeechRecognizer
    is_listening: bool,
}

impl MacOSVoiceWake {
    pub fn new(wake_word: &str) -> Self {
        // Initialize SFSpeechRecognizer
        // Request microphone permissions
        // Configure for continuous listening
        Self {
            recognizer: create_speech_recognizer(),
            is_listening: false,
        }
    }

    pub fn start_listening(&mut self) {
        // Start real-time speech recognition
        // Detect wake word in audio stream
    }
}
```

#### Permissions
macOS privacy permissions are declared as usage-description keys in the app's `Info.plist` (placed in `src-tauri/`, where Tauri merges it into the bundle), not in `tauri.conf.json`:

```xml
<!-- src-tauri/Info.plist -->
<key>NSMicrophoneUsageDescription</key>
<string>ATOM needs microphone access to listen for voice commands.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>ATOM uses speech recognition to understand voice commands.</string>
```

### Windows (Future)

#### Windows Speech Platform
```rust
// src-tauri/src/voice_wake_windows.rs
use windows::Win32::Media::Speech::*;

pub struct WindowsVoiceWake {
    recognizer: ISpeechRecognizer,
}

impl WindowsVoiceWake {
    pub fn new(wake_word: &str) -> Self {
        // Initialize Windows Speech Platform
        Self {
            recognizer: create_speech_recognizer(),
        }
    }
}
```

### Linux (Future)

#### Pocketsphinx Integration
```rust
// src-tauri/src/voice_wake_linux.rs
use pocketsphinx::*;

pub struct LinuxVoiceWake {
    decoder: ps_decoder_t,
}

impl LinuxVoiceWake {
    pub fn new(wake_word: &str) -> Self {
        // Initialize Pocketsphinx
        // Load acoustic model
        Self {
            decoder: create_decoder(),
        }
    }
}
```

---
## Supported Commands

### Command Types

#### 1. Agent Execution

```
"Hey ATOM, execute reconcile inventory"
"ATOM, run the sales report"
"Send a summary email to John"
```

#### 2. Dashboard Navigation

```
"Open my dashboard"
"Go to dashboard"
"Show me the dashboard"
```

#### 3. Status Queries

```
"Check agent health"
"What's the status?"
"How are the agents doing?"
```

#### 4. Quick Actions

```
"Pause all agents"
"Enable autonomous mode"
"Show recent interventions"
```

---
## Usage Flow

### User Interaction

1. **Wake Word Detection**
2. **Command Listening**
3. **Intent Classification**
4. **Action Execution**
5. **Feedback**
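The five steps above can be sketched as a single pipeline. Every callable here is a placeholder for the platform-specific components described earlier; only the order of operations is the point:

```python
# Hypothetical five-stage pipeline for one utterance. The callables are
# placeholders for the platform-specific components described above.
from typing import Callable

def handle_utterance(
    audio: bytes,
    detect_wake_word: Callable[[bytes], bool],  # 1. Wake Word Detection
    transcribe: Callable[[bytes], str],         # 2. Command Listening
    classify: Callable[[str], str],             # 3. Intent Classification
    execute: Callable[[str, str], str],         # 4. Action Execution
) -> str:
    """Return feedback text for the user (step 5), or '' while still idle."""
    if not detect_wake_word(audio):
        return ""  # stay idle until the wake word is heard (local, no cloud)
    command = transcribe(audio)
    intent = classify(command)
    return execute(intent, command)  # feedback string spoken/shown to the user
```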
---
Configuration
Wake Word Settings
const voiceConfig = {
wakeWord: "Hey ATOM",
sensitivity: 0.7, // 0-1, higher = more sensitive
language: "en-US",
continuous: false, // Stop after one command
requireConfirmation: true // Ask for confirmation on sensitive actions
};Command Settings
```typescript
const commandConfig = {
  confidenceThreshold: 0.7, // Minimum confidence to process
  timeout: 5000,            // Command timeout (ms)
  maxRetries: 3,            // Maximum recognition retries
};
```

---
## Permissions & Security

### Required Permissions

#### macOS

- `microphone` - Audio capture
- `speech-recognition` - Speech processing

#### Windows

- Microphone access
- Speech recognition access

#### Linux

- Audio input device
- Speech recognition library
### Security Measures
- **Local Processing**
  - Wake word detected locally
  - No audio sent to the cloud until the wake word is heard
- **Authentication**
  - All commands authenticated via JWT
  - Tenant-isolated command execution
- **Audit Logging**
  - All voice commands logged
  - Includes transcript, timestamp, and result
- **Confirmation Required**
  - Sensitive commands require confirmation
  - User can review before execution
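A minimal sketch of the audit-logging measure above; the field names are illustrative, not the actual logging schema:

```python
# Hypothetical audit-log record for one voice command. Field names are
# illustrative; the real schema may differ.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class VoiceCommandAuditEntry:
    tenant_id: str
    transcript: str  # what the user said
    result: str      # outcome of command execution
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_voice_command(entry: VoiceCommandAuditEntry) -> dict:
    """Serialize the entry; a real implementation would persist it."""
    return asdict(entry)
```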
---
## Error Handling

### Common Errors

#### Error: Speech Recognition Not Supported
```typescript
if (!client.isSupported()) {
  showNotification(
    "Voice commands not supported in this browser",
    "error"
  );
}
```

#### Error: Low Confidence Recognition
```typescript
if (confidence < 0.7) {
  return {
    success: false,
    message: "I didn't catch that. Could you repeat?",
    requires_confirmation: true
  };
}
```

#### Error: Microphone Access Denied
```typescript
navigator.mediaDevices.getUserMedia({ audio: true })
  .catch(err => {
    showNotification(
      "Microphone access denied. Please enable it in browser settings.",
      "error"
    );
  });
```

---
## Testing

### Manual Testing
- **Test Wake Word**
- **Test Commands**
- **Test Low Confidence**
- **Test Background Noise**
### Automated Tests

```typescript
// Test voice command classification
describe('Voice Command Processor', () => {
  it('should classify dashboard command', async () => {
    const result = await processor.process_voice_command(
      tenant_id,
      agent_id,
      "Open dashboard",
      0.95
    );
    expect(result.action).toBe('open_url');
  });

  it('should reject low confidence commands', async () => {
    const result = await processor.process_voice_command(
      tenant_id,
      agent_id,
      "Mumble mumble",
      0.5
    );
    expect(result.success).toBe(false);
  });
});
```

---
## Performance

### Latency Targets
| Operation | Target | Actual |
|---|---|---|
| Wake word detection | < 500ms | TBD |
| Speech to text | < 1s | TBD |
| Intent classification | < 500ms | TBD |
| Command execution | < 2s | TBD |
| End-to-end | < 3s | TBD |
### Optimization Strategies

- **Local Wake Word Detection**
  - Reduce cloud dependency
  - Faster response time
- **Intent Caching**
  - Cache common command intents
  - Faster classification
- **Audio Processing**
  - Downsample audio for faster processing
  - Noise reduction for better accuracy
---
## Future Enhancements

### Phase 2 (Q2 2025)
- [ ] Windows speech recognition
- [ ] Linux speech recognition (Pocketsphinx)
- [ ] Custom wake word training
- [ ] Voice biometrics for security
### Phase 3 (Q3 2025)
- [ ] Multi-language support
- [ ] Context-aware commands
- [ ] Command history and replay
- [ ] Voice shortcuts/macros
### Phase 4 (Q4 2025)
- [ ] Natural conversation mode
- [ ] Follow-up questions
- [ ] Multi-turn dialogues
- [ ] Voice-activated workflows
---
## Troubleshooting

### Issue: Wake word not detected

**Solutions**:
- Check microphone permissions
- Adjust sensitivity setting
- Speak closer to microphone
- Reduce background noise
### Issue: Commands misclassified

**Solutions**:
- Speak clearly and at moderate pace
- Use exact command phrases
- Check confidence score
- Review command logs
### Issue: High error rate

**Solutions**:
- Train custom wake word model
- Adjust sensitivity threshold
- Improve microphone quality
- Reduce ambient noise
---
**Status**: Foundation Complete
**Platform Support**: macOS (ready), Windows (planned), Linux (planned)
**Production Ready**: With macOS native integration